Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements for people parsing email headers #505

Merged
merged 6 commits into from
May 14, 2013

Conversation

ConradIrwin
Copy link
Collaborator

The main change here is to defer as much header-parsing work as possible. This gives me a 2000x speedup for an (admittedly pathological, but real) example when combined with Pull #490

This email was sent around our company: https://gist.github.com/5901bbd810c08ed3d0b1, master currently takes 6.9s to parse those headers!

Pathological example:

 master: 6.9s
 bpot/ragel: 0.3s
 lazy-fields: 0.008s
 lazy-fields + bpot/ragel: 0.003s

(The "best case" would be as fast as my hacky zippy_from function, which is 0.0005s so still 7 times faster than I managed to make this. Most of the remaining time is spent in .to_crlf and .split, see profiler output, so I'm pretty content to leave it as-is)

rspec on ruby-1.9.3:

 master: 19s
 bpot/ragel: 5.6s
 lazy-fields:  11s
 lazy-fields + bpot/ragel: 4.2s

(Other rubies had similar shapes, but different actual numbers)

This is a substantial performance improvement for people who are parsing
emails but do not need access to all the headers.

In particular I am doing: Mail::Header.new(<str>)['From'].to_s
on this example: https://gist.github.com/5901bbd810c08ed3d0b1
This corresponds to about a 10% time-saving when running the specs with
bpot/ragel merged, and a bigger performance improvement for parsing
workloads.
This has little effect on the specs, but on my header reading example it
makes about a 10x performance difference, finally bringing it within one
order of magnitude of the "fast hacky solution" at
https://gist.github.com/5901bbd810c08ed3d0b1
@ConradIrwin
Copy link
Collaborator Author

Ok, I wouldn't normally add a new commit to the same branch, but I got a bit carried away!

If we binary search of Mail::FieldList#<< then we halve the time to run the specs again :). so now:

rspec on ruby-1.9.3:

master: 19s
bpot/ragel: 5.6s
lazy-fields: 6.1s
lazy-fields + bpot/ragel: 2.2s <-- yes, 10x the speed of master!

pathological example:

master: 6.9s
bpot/ragel: 0.3s
lazy-fields: 0.004s
lazy-fiels + bpot/ragel: 0.0015s <-- yes 4600x the speed of master!

@bpot
Copy link
Contributor

bpot commented Feb 4, 2013

Cool! Mail::FieldList#<< was the next thing I was going to look at. Binary search is probably what we want to do but we need to ensure that it preserves the order of headers that appear multiple times: 4c0962d

@ConradIrwin
Copy link
Collaborator Author

@bpot This happens implicitly as I'm using insort_right (insert at the point at which all items to the right are strictly greater than the new header). If I change it to use insort_left (insert at the point at which all items to the left are strictly less than the new header) then the header-reordering specs fail.

See http://hg.python.org/cpython/file/2.7/Lib/bisect.py for the original code and http://docs.python.org/2/library/bisect.html for the docs.

@bpot
Copy link
Contributor

bpot commented Feb 4, 2013

Ah cool!

Mail::Multibyte is not defined in that case, so we just force everything
to be a string.
@jeremy
Copy link
Collaborator

jeremy commented Feb 10, 2013

Sweeet.

@mikel
Copy link
Owner

mikel commented May 14, 2013

Hey there, thanks for this, looks great :)

When I run the specs based against master, I am getting a LOT of warnings:

/Users/mikel/Code/mail/lib/mail/field.rb:137: warning: instance variable @raw_value not initialized
/Users/mikel/Code/mail/lib/mail/field.rb:137: warning: instance variable @value not initialized

Can you investigate and fix? If so, I'll get this merged in ASAP.

@ConradIrwin
Copy link
Collaborator Author

@mikel should now be fixed. Thanks :).

@mikel mikel merged commit 0f7e762 into mikel:master May 14, 2013
@mikel
Copy link
Owner

mikel commented May 14, 2013

DONE :) Merged into Master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants